Crate serde_arrow
serde_arrow - convert sequences of Rust objects to / from Arrow arrays
The Arrow in-memory format is a powerful way to work with data-frame-like structures. However, the API of the underlying Rust crates can at times be cumbersome to use due to the statically typed nature of Rust. serde_arrow offers a simple way to convert Rust objects into Arrow arrays and back. serde_arrow relies on the Serde package to interpret Rust objects. Therefore, adding support for serde_arrow to custom types is as easy as using Serde’s derive macros.
In the Rust ecosystem there are two competing implementations of the Arrow in-memory format. serde_arrow supports both arrow and arrow2 for schema tracing and serialization from Rust structs to arrays. Deserialization from arrays to Rust structs is currently only implemented for arrow2.
Overview
The functions come in pairs: some work on single arrays, i.e., the series of a data frame; others work on multiple arrays, i.e., data frames themselves.
implementation | operation | multiple arrays | single array |
---|---|---|---|
arrow | schema tracing | arrow::serialize_into_fields | arrow::serialize_into_field |
arrow | Rust to Arrow | arrow::serialize_into_arrays | arrow::serialize_into_array |
arrow | Arrow to Rust | not supported | not supported |
arrow | Builder | arrow::ArraysBuilder | arrow::ArrayBuilder |
arrow2 | schema tracing | arrow2::serialize_into_fields | arrow2::serialize_into_field |
arrow2 | Rust to Arrow | arrow2::serialize_into_arrays | arrow2::serialize_into_array |
arrow2 | Arrow to Rust | arrow2::deserialize_from_arrays | arrow2::deserialize_from_array |
arrow2 | Builder | arrow2::ArraysBuilder | arrow2::ArrayBuilder |
Functions working on multiple arrays expect sequences of records in Rust, e.g., a vector of structs. Functions working on single arrays expect vectors of array elements.
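To make the distinction between records and arrays concrete, the following is a minimal sketch in plain Rust, without any Arrow types. It is not how serde_arrow works internally; `ExampleColumns` and `to_columns` are hypothetical names introduced only for this illustration. A vector of structs corresponds to the input of the multi-array functions, and each per-field vector corresponds to the kind of input a single-array function would take.

```rust
// Illustration only: plain Rust, no Arrow or serde_arrow types involved.

/// A record (row), as it would be passed to the multi-array functions.
#[derive(Debug, Clone, PartialEq)]
struct Example {
    a: f32,
    b: i32,
}

/// Hypothetical columnar layout: one vector of elements per field.
#[derive(Debug, Default, PartialEq)]
struct ExampleColumns {
    a: Vec<f32>,
    b: Vec<i32>,
}

/// Split a sequence of records into one vector per field.
fn to_columns(records: &[Example]) -> ExampleColumns {
    let mut columns = ExampleColumns::default();
    for record in records {
        columns.a.push(record.a);
        columns.b.push(record.b);
    }
    columns
}
```

Calling `to_columns` on a slice of `Example` values yields the `a` and `b` vectors separately; each of them is a sequence of array elements rather than a sequence of records.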
Example
Requires one of the arrow2-* features (see below).
use serde::Serialize;
use serde_arrow::{
    schema::TracingOptions,
    arrow2::{serialize_into_fields, serialize_into_arrays},
};
#[derive(Serialize)]
struct Example {
    a: f32,
    b: i32,
}
let records = vec![
    Example { a: 1.0, b: 1 },
    Example { a: 2.0, b: 2 },
    Example { a: 3.0, b: 3 },
];
// Auto-detect the arrow types. Result may need to be overwritten and
// customized, see serde_arrow::schema::Strategy for details.
let fields = serialize_into_fields(&records, TracingOptions::default())?;
let arrays = serialize_into_arrays(&fields, &records)?;
The generated arrays can then be written to disk, e.g., as parquet, and loaded in another system.
use arrow2::{chunk::Chunk, datatypes::Schema};
// write_chunk is not part of serde_arrow; for how to write such a helper,
// see https://jorgecarleitao.github.io/arrow2/io/parquet_write.html
write_chunk(
    "example.pq",
    Schema::from(fields),
    Chunk::new(arrays),
)?;
See also:
- the quickstart guide for more examples of how to use this package
- the implementation notes for an explanation of how this package works and its underlying data model
- the status summary for an overview of the supported Arrow and Rust constructs
Features:
Which version of arrow or arrow2 is used can be selected via features. By default, no Arrow implementation is used. In that case only the base features of serde_arrow are available.
The arrow-* and arrow2-* feature groups are compatible with each other, i.e., it is possible to use arrow and arrow2 together. Within each group, the highest version is selected if multiple features are activated. E.g., when selecting arrow2-0-16 and arrow2-0-17, arrow2=0.17 will be used.
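The selection rule can be modeled with a small sketch in plain Rust. This is only an illustration of the rule as described above: the crate itself resolves features at compile time through Cargo, and `parse_feature` and `selected_version` are hypothetical helpers, not part of serde_arrow.

```rust
/// Parse a feature name such as "arrow2-0-17" or "arrow-39" into its group
/// ("arrow" or "arrow2") and numeric version components.
/// Returns None for names that do not follow this pattern.
fn parse_feature(name: &str) -> Option<(&str, Vec<u32>)> {
    let (group, rest) = name.split_once('-')?;
    if group != "arrow" && group != "arrow2" {
        return None;
    }
    let version = rest
        .split('-')
        .map(|part| part.parse::<u32>().ok())
        .collect::<Option<Vec<u32>>>()?;
    Some((group, version))
}

/// Within one feature group, pick the highest version among the enabled
/// features. Versions are compared component by component.
fn selected_version(enabled: &[&str], group: &str) -> Option<Vec<u32>> {
    enabled
        .iter()
        .filter_map(|name| parse_feature(*name))
        .filter(|(g, _)| *g == group)
        .map(|(_, version)| version)
        .max()
}
```

Under this model, enabling both arrow2-0-16 and arrow2-0-17 resolves the arrow2 group to version components `[0, 17]`, while the arrow group is resolved independently of it.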
Available features:
Feature | Arrow Version |
---|---|
arrow-39 | arrow=39 |
arrow-38 | arrow=38 |
arrow-37 | arrow=37 |
arrow-36 | arrow=36 |
arrow-35 | arrow=35 |
arrow2-0-17 | arrow2=0.17 |
arrow2-0-16 | arrow2=0.16 |
Modules
- Internal. Do not use
- Support for the arrow crate (requires one of the arrow-* features)
- Support for the arrow2 crate (requires one of the arrow2-* features)
- The basic machinery powering serde_arrow
- Experimental functionality that is not bound by semver compatibility
- Helpers to configure how Arrow and Rust types are translated into one another
Enums
- Common errors during serde_arrow’s usage
Type Definitions
- A Result type that defaults to serde_arrow’s Error type